Structural relaxations
Contents
Structural relaxations#
A structural relaxation or structure optimization is the process of iteratively updating atom positions to find the atom positions that minimize the energy of the structure. Standard optimization methods are used in structural relaxations — below we use the Limited-Memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) algorithm. The step number, time, energy, and force max are printed at each optimization step. Each step is considered one example because it provides all the information we need to train models for the S2EF task and the entire set of steps is referred to as a trajectory. Visualizing intermediate structures or viewing the entire trajectory can be illuminating to understand what is physically happening and to look for problems in the simulation, especially when we run ML-driven relaxations. Common problems one may look out for - atoms excessively overlapping/colliding with each other and atoms flying off into random directions.
import os
import ase.io
import numpy as np
from ase import Atoms
from ase.build import add_adsorbate, fcc100, molecule
from ase.calculators.emt import EMT
from ase.constraints import FixAtoms
from ase.io import extxyz
from ase.io.trajectory import Trajectory
from ase.optimize import LBFGS
from ase.visualize.plot import plot_atoms
# This cell sets up and runs a structural relaxation
# of a propane (C3H8) adsorbate on a copper (Cu) surface
adslab = fcc100("Cu", size=(3, 3, 3))
adsorbate = molecule("C3H8")
add_adsorbate(adslab, adsorbate, 3, offset=(1, 1)) # adslab = adsorbate + slab
# tag all slab atoms below surface as 0, surface as 1, adsorbate as 2
tags = np.zeros(len(adslab))
tags[18:27] = 1
tags[27:] = 2
adslab.set_tags(tags)
# Fixed atoms are prevented from moving during a structure relaxation.
# We fix all slab atoms beneath the surface.
cons = FixAtoms(indices=[atom.index for atom in adslab if (atom.tag == 0)])
adslab.set_constraint(cons)
adslab.center(vacuum=13.0, axis=2)
adslab.set_pbc(True)
adslab.set_calculator(EMT())
os.makedirs("data", exist_ok=True)
# Define structure optimizer - LBFGS. Run for 100 steps,
# or if the max force on all atoms (fmax) is below 0 ev/A.
# fmax is typically set to 0.01-0.05 eV/A,
# for this demo however we run for the full 100 steps.
dyn = LBFGS(adslab, trajectory="data/toy_c3h8_relax.traj")
dyn.run(fmax=0, steps=100)
traj = ase.io.read("data/toy_c3h8_relax.traj", ":")
# convert traj format to extxyz format (used by OC20 dataset)
columns = ["symbols", "positions", "move_mask", "tags"]
with open("data/toy_c3h8_relax.extxyz", "w") as f:
extxyz.write_xyz(f, traj, columns=columns)
Step Time Energy fmax
*Force-consistent energies used in optimization.
LBFGS: 0 20:39:06 15.804700* 6.7764
LBFGS: 1 20:39:07 12.190607* 4.3232
LBFGS: 2 20:39:07 10.240169* 2.2655
LBFGS: 3 20:39:07 9.779223* 0.9372
LBFGS: 4 20:39:07 9.671525* 0.7702
LBFGS: 5 20:39:07 9.574461* 0.6635
LBFGS: 6 20:39:07 9.537502* 0.5718
LBFGS: 7 20:39:07 9.516673* 0.4466
LBFGS: 8 20:39:07 9.481330* 0.4611
LBFGS: 9 20:39:07 9.462255* 0.2931
LBFGS: 10 20:39:07 9.448937* 0.2490
LBFGS: 11 20:39:07 9.433813* 0.2371
LBFGS: 12 20:39:07 9.418884* 0.2602
LBFGS: 13 20:39:07 9.409649* 0.2532
LBFGS: 14 20:39:07 9.404838* 0.1624
LBFGS: 15 20:39:07 9.401753* 0.1823
LBFGS: 16 20:39:07 9.397314* 0.2592
LBFGS: 17 20:39:07 9.387947* 0.3450
LBFGS: 18 20:39:07 9.370825* 0.4070
LBFGS: 19 20:39:07 9.342222* 0.4333
LBFGS: 20 20:39:07 9.286822* 0.5002
LBFGS: 21 20:39:07 9.249910* 0.5241
LBFGS: 22 20:39:07 9.187179* 0.5120
LBFGS: 23 20:39:07 9.124811* 0.5718
LBFGS: 24 20:39:07 9.066185* 0.5409
LBFGS: 25 20:39:07 9.000116* 1.0798
LBFGS: 26 20:39:07 8.893632* 0.7528
LBFGS: 27 20:39:07 8.845939* 0.3321
LBFGS: 28 20:39:07 8.815173* 0.2512
LBFGS: 29 20:39:07 8.808721* 0.2143
LBFGS: 30 20:39:07 8.794643* 0.1546
LBFGS: 31 20:39:07 8.789162* 0.2014
LBFGS: 32 20:39:07 8.782320* 0.1755
LBFGS: 33 20:39:07 8.780394* 0.1037
LBFGS: 34 20:39:07 8.778410* 0.1076
LBFGS: 35 20:39:07 8.775079* 0.1797
LBFGS: 36 20:39:07 8.766987* 0.3334
LBFGS: 37 20:39:07 8.750249* 0.5307
LBFGS: 38 20:39:08 8.725928* 0.6851
LBFGS: 39 20:39:08 8.702312* 0.5823
LBFGS: 40 20:39:08 8.661515* 0.3996
LBFGS: 41 20:39:08 8.643432* 0.5585
LBFGS: 42 20:39:08 8.621201* 0.3673
LBFGS: 43 20:39:08 8.614414* 0.1394
LBFGS: 44 20:39:08 8.610785* 0.1372
LBFGS: 45 20:39:08 8.608134* 0.1464
LBFGS: 46 20:39:08 8.604928* 0.1196
LBFGS: 47 20:39:08 8.599151* 0.1354
LBFGS: 48 20:39:08 8.594063* 0.1479
LBFGS: 49 20:39:08 8.589493* 0.1538
LBFGS: 50 20:39:08 8.587274* 0.0885
LBFGS: 51 20:39:08 8.584633* 0.0938
LBFGS: 52 20:39:08 8.580239* 0.1409
LBFGS: 53 20:39:08 8.572938* 0.2543
LBFGS: 54 20:39:08 8.563343* 0.2919
LBFGS: 55 20:39:08 8.554117* 0.1966
LBFGS: 56 20:39:08 8.547597* 0.1291
LBFGS: 57 20:39:08 8.542086* 0.1280
LBFGS: 58 20:39:08 8.535432* 0.0982
LBFGS: 59 20:39:08 8.533622* 0.1277
LBFGS: 60 20:39:08 8.527487* 0.1167
LBFGS: 61 20:39:08 8.523863* 0.1218
LBFGS: 62 20:39:08 8.519229* 0.1305
LBFGS: 63 20:39:08 8.515424* 0.1019
LBFGS: 64 20:39:08 8.511240* 0.2122
LBFGS: 65 20:39:08 8.507967* 0.2666
LBFGS: 66 20:39:08 8.503903* 0.2377
LBFGS: 67 20:39:08 8.497575* 0.1623
LBFGS: 68 20:39:08 8.485434* 0.2022
LBFGS: 69 20:39:08 8.466738* 0.2159
LBFGS: 70 20:39:08 8.467607* 0.3348
LBFGS: 71 20:39:08 8.454037* 0.1063
LBFGS: 72 20:39:08 8.448980* 0.1197
LBFGS: 73 20:39:09 8.446550* 0.0992
LBFGS: 74 20:39:09 8.444705* 0.0562
LBFGS: 75 20:39:09 8.443403* 0.0388
LBFGS: 76 20:39:09 8.442646* 0.0548
LBFGS: 77 20:39:09 8.442114* 0.0614
LBFGS: 78 20:39:09 8.440960* 0.0588
LBFGS: 79 20:39:09 8.439820* 0.0482
LBFGS: 80 20:39:09 8.438600* 0.0513
LBFGS: 81 20:39:09 8.437429* 0.0541
LBFGS: 82 20:39:09 8.435695* 0.0672
LBFGS: 83 20:39:09 8.431957* 0.0857
LBFGS: 84 20:39:09 8.423485* 0.1332
LBFGS: 85 20:39:09 8.413846* 0.2078
LBFGS: 86 20:39:09 8.404849* 0.1787
LBFGS: 87 20:39:09 8.385339* 0.1690
LBFGS: 88 20:39:09 8.386849* 0.1876
LBFGS: 89 20:39:09 8.371078* 0.1181
LBFGS: 90 20:39:09 8.368801* 0.0942
LBFGS: 91 20:39:09 8.366226* 0.0670
LBFGS: 92 20:39:09 8.361680* 0.0550
LBFGS: 93 20:39:09 8.360631* 0.0473
LBFGS: 94 20:39:09 8.359692* 0.0242
LBFGS: 95 20:39:09 8.359361* 0.0155
LBFGS: 96 20:39:09 8.359163* 0.0143
LBFGS: 97 20:39:09 8.359102* 0.0156
LBFGS: 98 20:39:09 8.359048* 0.0155
LBFGS: 99 20:39:09 8.358986* 0.0142
LBFGS: 100 20:39:09 8.358921* 0.0132
/home/runner/micromamba-root/envs/buildenv/lib/python3.9/site-packages/ase/io/extxyz.py:301: UserWarning: Skipping unhashable information adsorbate_info
warnings.warn('Skipping unhashable information '
Reading a trajectory#
identifier = "toy_c3h8_relax.extxyz"
# the `index` argument corresponds to what frame of the trajectory to read in, specifiying ":" reads in the full trajectory.
traj = ase.io.read(f"data/{identifier}", index=":")
Viewing a trajectory#
Below we visualize the initial, middle, and final steps in the structural relaxation trajectory from above. Copper atoms in the surface are colored orange, the propane adsorbate on the surface has grey colored carbon atoms and white colored hydrogen atoms. The adsorbate’s structure changes during the simulation and you can see how it relaxes on the surface. In this case, the relaxation looks normal; however, there can be instances where the adsorbate flies away (desorbs) from the surface or the adsorbate can break apart (dissociation), which are hard to detect without visualization.
Tip
Visualizations can be used as a quick sanity check to ensure the initial system is set up correctly and there are no major issues with the simulation!
import matplotlib.pyplot as plt
from ase.visualize.plot import animate
from matplotlib import rc
anim = animate(traj, radii=0.8, rotation=("-75x, 45y, 10z"))
rc("animation", html="jshtml")
anim
Warning
Notice that this relaxation looks quite strange - the adsorbate starts as a propane molecule but then sort of falls apart over the course of the relaxation.
This is because we’re using the EMT calculator, which works ok for some simple metal surfaces but is basically just a toy model for any organic atoms (C, H, O, etc). We’re using the EMT calculator here because it’s extremely fast for demonstration purposes!
Data contents #
Here we take a closer look at what information is contained within these trajectories.
i_structure = traj[0]
i_structure
Atoms(symbols='Cu27C3H8', pbc=True, cell=[7.65796644025031, 7.65796644025031, 33.266996999999996], energies=..., forces=..., tags=..., constraint=FixAtoms(indices=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]), calculator=SinglePointCalculator(...))
Atomic numbers#
numbers = i_structure.get_atomic_numbers()
print(numbers)
[29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
29 29 29 6 6 6 1 1 1 1 1 1 1 1]
Atomic symbols#
symbols = np.array(i_structure.get_chemical_symbols())
print(symbols)
['Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu'
'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'Cu' 'C' 'C'
'C' 'H' 'H' 'H' 'H' 'H' 'H' 'H' 'H']
Unit cell#
The unit cell is the volume containing our system of interest. Express as a 3x3 array representing the directional vectors that make up the volume. Illustrated as the dashed box in the above visuals.
cell = np.array(i_structure.cell)
print(cell)
[[ 7.65796644 0. 0. ]
[ 0. 7.65796644 0. ]
[ 0. 0. 33.266997 ]]
Periodic boundary conditions (PBC)#
x,y,z boolean representing whether a unit cell repeats in the corresponding directions. The OC20 dataset sets this to [True, True, True], with a large enough vacuum layer above the surface such that a unit cell does not see itself in the z direction. Although the original structure shown above is what get’s passed into our models, the presence of PBC allows it to effectively repeat infinitely in the x and y directions. Below we visualize the same structure with a periodicity of 2 in all directions, what the model may effectively see.
pbc = i_structure.pbc
print(pbc)
[ True True True]
fig, ax = plt.subplots(1, 3)
labels = ["initial", "middle", "final"]
for i in range(3):
ax[i].axis("off")
ax[i].set_title(labels[i])
ase.visualize.plot.plot_atoms(
traj[0].repeat((2, 2, 1)), ax[0], radii=0.8, rotation=("-75x, 45y, 10z")
)
ase.visualize.plot.plot_atoms(
traj[50].repeat((2, 2, 1)), ax[1], radii=0.8, rotation=("-75x, 45y, 10z")
)
ase.visualize.plot.plot_atoms(
traj[-1].repeat((2, 2, 1)), ax[2], radii=0.8, rotation=("-75x, 45y, 10z")
)
<AxesSubplot:title={'center':'final'}>
Fixed atoms constraint#
In reality, surfaces contain many, many more atoms beneath what we’ve illustrated as the surface. At an infinite depth, these subsurface atoms would look just like the bulk structure. We approximate a true surface by fixing the subsurface atoms into their “bulk” locations. This ensures that they cannot move at the “bottom” of the surface. If they could, this would throw off our calculations. Consistent with the above, we fix all atoms with tags=0, and denote them as “fixed”. All other atoms are considered “free”.
cons = i_structure.constraints[0]
print(cons, "\n")
# indices of fixed atoms
indices = cons.index
print(indices, "\n")
# fixed atoms correspond to tags = 0
print(tags[indices])
FixAtoms(indices=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Adsorption energy#
The energy of the system is one of the properties of interest in the OC20 dataset. It’s important to note that absolute energies provide little value to researchers and must be referenced properly to be useful. The OC20 dataset references all it’s energies to the bare slab + gas references to arrive at adsorption energies. Adsorption energies are important in studying catalysts and their corresponding reaction rates. In addition to the structure relaxations of the OC20 dataset, bare slab and gas (N2, H2, H2O, CO) relaxations were carried out with DFT in order to calculate adsorption energies.
final_structure = traj[-1]
relaxed_energy = final_structure.get_potential_energy()
print(f"Relaxed absolute energy = {relaxed_energy} eV")
# Corresponding raw slab used in original adslab (adsorbate+slab) system.
raw_slab = fcc100("Cu", size=(3, 3, 3))
raw_slab.set_calculator(EMT())
raw_slab_energy = raw_slab.get_potential_energy()
print(f"Raw slab energy = {raw_slab_energy} eV")
adsorbate = Atoms("C3H8").get_chemical_symbols()
# For clarity, we define arbitrary gas reference energies here.
# A more detailed discussion of these calculations can be found in the corresponding paper's SI.
gas_reference_energies = {"H": 0.3, "O": 0.45, "C": 0.35, "N": 0.50}
adsorbate_reference_energy = 0
for ads in adsorbate:
adsorbate_reference_energy += gas_reference_energies[ads]
print(f"Adsorbate reference energy = {adsorbate_reference_energy} eV\n")
adsorption_energy = relaxed_energy - raw_slab_energy - adsorbate_reference_energy
print(f"Adsorption energy: {adsorption_energy} eV")
Relaxed absolute energy = 8.358921451406472 eV
Raw slab energy = 8.127167122751231 eV
Adsorbate reference energy = 3.4499999999999993 eV
Adsorption energy: -3.218245671344759 eV
Plot energy profile of toy trajectory#
Plotting the energy profile of our trajectory is a good way to ensure nothing strange has occured. We expect to see a decreasing monotonic function. If the energy is consistently increasing or there’s multiple large spikes this could be a sign of some issues in the optimization. This is particularly useful for when analyzing ML-driven relaxations and whether they make general physical sense.
import pandas as pd
import plotly.express as px
energies = [
image.get_potential_energy() - raw_slab_energy - adsorbate_reference_energy
for image in traj
]
df = pd.DataFrame({"Relaxation Step": range(len(energies)), "Energy [eV]": energies})
fig = px.line(df, x="Relaxation Step", y="Energy [eV]")
fig.show()
Force#
Forces are another important property of the OC20 dataset. Unlike datasets like QM9 which contain only ground state properties, the OC20 dataset contains per-atom forces necessary to carry out atomistic simulations. Physically, forces are the negative gradient of energy w.r.t atomic positions: \(F = -\frac{dE}{dx}\). Although not mandatory (depending on the application), maintaining this energy-force consistency is important for models that seek to make predictions on both properties.
The “apply_constraint” argument controls whether to apply system constraints to the forces. In the OC20 dataset, this controls whether to return forces for fixed atoms (apply_constraint=False) or return 0s (apply_constraint=True).
# Returning forces for all atoms - regardless of whether "fixed" or "free"
i_structure.get_forces(apply_constraint=False)
array([[-1.07900000e-05, -3.80000000e-06, 1.13560540e-01],
[-0.00000000e+00, -4.29200000e-05, 1.13302410e-01],
[ 1.07900000e-05, -3.80000000e-06, 1.13560540e-01],
[-1.84600000e-05, 0.00000000e+00, 1.13543430e-01],
[ 0.00000000e+00, 0.00000000e+00, 1.13047800e-01],
[ 1.84600000e-05, 0.00000000e+00, 1.13543430e-01],
[-1.07900000e-05, 3.80000000e-06, 1.13560540e-01],
[-0.00000000e+00, 4.29200000e-05, 1.13302410e-01],
[ 1.07900000e-05, 3.80000000e-06, 1.13560540e-01],
[-1.10430500e-02, -2.53094000e-03, -4.84573700e-02],
[ 1.10430500e-02, -2.53094000e-03, -4.84573700e-02],
[ 0.00000000e+00, -2.20890000e-04, -2.07827000e-03],
[-1.10430500e-02, 2.53094000e-03, -4.84573700e-02],
[ 1.10430500e-02, 2.53094000e-03, -4.84573700e-02],
[-0.00000000e+00, 2.20890000e-04, -2.07827000e-03],
[-3.49808000e-03, -0.00000000e+00, -7.85544000e-03],
[ 3.49808000e-03, -0.00000000e+00, -7.85544000e-03],
[-0.00000000e+00, -0.00000000e+00, -5.97640000e-04],
[-3.18144370e-01, -2.36420450e-01, -3.97089230e-01],
[ 0.00000000e+00, -2.18895316e+00, -2.74768262e+00],
[ 3.18144370e-01, -2.36420450e-01, -3.97089230e-01],
[-5.65980520e-01, 0.00000000e+00, -6.16046990e-01],
[ 0.00000000e+00, 0.00000000e+00, -4.47152822e+00],
[ 5.65980520e-01, -0.00000000e+00, -6.16046990e-01],
[-3.18144370e-01, 2.36420450e-01, -3.97089230e-01],
[ 0.00000000e+00, 2.18895316e+00, -2.74768262e+00],
[ 3.18144370e-01, 2.36420450e-01, -3.97089230e-01],
[-0.00000000e+00, -0.00000000e+00, -3.96835355e+00],
[-0.00000000e+00, -3.64190926e+00, 5.71458646e+00],
[ 0.00000000e+00, 3.64190926e+00, 5.71458646e+00],
[-2.18178516e+00, 0.00000000e+00, 1.67589182e+00],
[ 2.18178516e+00, 0.00000000e+00, 1.67589182e+00],
[-0.00000000e+00, 2.46333681e+00, 1.78299828e+00],
[ 0.00000000e+00, -2.46333681e+00, 1.78299828e+00],
[ 6.18714050e+00, 2.26336330e-01, -5.99485570e-01],
[-6.18714050e+00, 2.26336330e-01, -5.99485570e-01],
[-6.18714050e+00, -2.26336330e-01, -5.99485570e-01],
[ 6.18714050e+00, -2.26336330e-01, -5.99485570e-01]])
# Applying the fixed atoms constraint to the forces
i_structure.get_forces(apply_constraint=True)
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[-0.31814437, -0.23642045, -0.39708923],
[ 0. , -2.18895316, -2.74768262],
[ 0.31814437, -0.23642045, -0.39708923],
[-0.56598052, 0. , -0.61604699],
[ 0. , 0. , -4.47152822],
[ 0.56598052, -0. , -0.61604699],
[-0.31814437, 0.23642045, -0.39708923],
[ 0. , 2.18895316, -2.74768262],
[ 0.31814437, 0.23642045, -0.39708923],
[-0. , -0. , -3.96835355],
[-0. , -3.64190926, 5.71458646],
[ 0. , 3.64190926, 5.71458646],
[-2.18178516, 0. , 1.67589182],
[ 2.18178516, 0. , 1.67589182],
[-0. , 2.46333681, 1.78299828],
[ 0. , -2.46333681, 1.78299828],
[ 6.1871405 , 0.22633633, -0.59948557],
[-6.1871405 , 0.22633633, -0.59948557],
[-6.1871405 , -0.22633633, -0.59948557],
[ 6.1871405 , -0.22633633, -0.59948557]])